JCapper Message Board

JCapper 101
-- UDM help - similar distance and similar surface

Home	Register Log In

UDM help - similar distance and similar surface

jeff
2/4/2016
11:27:24 AM

--quote:

"Could use some help. I want to make some UDMs --- 1st one, any race that is similar in distance and similar in surface to the most recent start where the horse ran 1st, 2nd, or 3rd in the most recent start."

--end quote

Let's cover the needed data fields in the JCapper StarterHistory Table Schema Doc first.

Similar in distance...

The two data fields that enable you to define distance today and distance last start are DIST and DISTLASTOUT - and they are listed in the JCapper StarterHistory Table Schema Doc as follows:

Field Name Data Type Description
--------------------- ------------- ----------------------------------------------
dist number single distance in yards today’s race
distlastout number single distance in yards most recent start

Similar in surface...

The data fields that enable you to define surface today and (from there) get a handle on relationship to surface last start are INTSURFACE, SURFACE, and SFSHIFT - and they are listed in the JCapper StarterHistory Table Schema Doc as follows:

Field Name Data Type Description
--------------------- ------------- ----------------------------------------------
surface text 1 surface type (D, d, O, T, t)
intsurface number integer surface type (1=D, 2=d, 3=O, 4=T, 5=t)
sfshift number integer surface shift

Horse ran 1st, 2nd, or 3rd (last time out)...

The data field that enables you to define finish position (last start) is POSFINCALL - and is listed in the JCapper StarterHistory Table Schema Doc as follows:

Field Name Data Type Description
--------------------- ------------- ----------------------------------------------
posfincall number integer position at finish call last start

Before diving in and creating sql expressions, let's talk about framework.

You used the terms "similar in distance" and "similar in surface." Both of these can mean different things to different players. Some players might insist on an exact match for both surface and distance. Others might insist on an exact match for surface, but be ok if distance today is within either half a furlong or within one furlong of distance last out. Still others might be ok if distance today falls within the same category as distance last out - such as sprint today with sprint last out or route today with route last out.

If I were creating these (or similar UDMs) for my own use - I'd decide on a framework first and from there make sure all UDMs of the type you are describing in your post adhere to whatever framework I decided to implement.

For example, one way to do it would be to break race distance and surface out into broad categories such as dirt sprint, dirt route, turf sprint, and turf route - and from there create a separate set of UDMs within each category.

Another way to do it (especially if similar distance means exact distance to your way of thinking) would be to break race distance and surface out into much tighter categories such as specific distance and surface for the tracks that you play.

For example: 5.5f AQU Inner... 6.0f AQU Inner... 8.0f AQU Inner... and 8.5f AQU Inner... and from there create separate UDMs within each category.

For example purposes - because the AQU Inner meet is currently underway - and because the more situational data queries I look at the more I am convinced there is more value to be had in analyzing the tendencies of specific track-surface-dist categories as opposed to using blended track-surface-dist categories: I am going to use 6.0f on the AQU Inner in the example sql expressions below.

Assume for the sake of argument that today's race is being run at 6.0f at AQU on the inner surface - and that I am writing UDMs for that.

Ok, let's dive into some sql expressions...

I'll start off with track code:

SELECT * FROM STARTERHISTORY WHERE TRACK = 'AQU'

Next, similar in distance...

AND DIST = 1320
AND DISTLASTOUT = 1320

The first line handles today's distance. The second line handles distance last start. Here I've defined both as 6.0f or 1320 yds.

Alternately, I could evaluate them mathematically using something like this:

AND ABS(DIST - DISTLASTOUT) = 0

Alternately, If similar distance to you means within a half furlong, you could use something like the following:

AND ABS(DIST - DISTLASTOUT) <= 0.5

Or if similar distance to you means within one furlong, you could do it like this:

AND ABS(DIST - DISTLASTOUT) <= 1

Note: In the above lines of sql I am evaluating the absolute value of the difference between distance today and distance last out. Doing it this way handles 'about' distances that show in the data as negative numbers.

Next, similar surface...

AND INTSURFACE = 2
AND SFSHIFT = 3

The first line of sql above defines today's surface as 2 or an inner dirt course. The second line of sql above defines surface shift as 3 which from the section on Surface Shift from the Supported Factors Page at JCapper.com means inner dirt course today and inner dirt course previous race.

Note: If you are writing a similar UDM for some other category... outer or main dirt course... inner turf course, etc. - use the appropriate surface shift number from the chart shown in section on Surface Shift from the Supported Factors Page at JCapper.com

Next, horse ran 1st, 2nd, or 3rd (last time out)...

AND POSFINCALL BETWEEN 1 AND 3

The above line of sql defines finish call last start as 1, 2, or 3.

Alternately you could do it this way:

AND POSFINCALL >= 1
AND POSFINCALL <= 3

One more consideration: Number of lifetime starts...

This type of UDM evaluates a horse's most recent start.

Whenever a UDM evaluates data points generated by a horse's previous starts - it is generally a good idea to add a constraint that eliminates first time starters.

This prevents errors that can arise if any of your evaluations are using math and encounter missing values (like the example above where I evaluated the absolute value of distance last out subtracted from distance today.)

In the example case above distance last out for first time starters is missing from the data.

Depending on your database driver mathematical evaluation of empty or missing values can result in errors.

One easy way to avoid that possibility is to eliminate first time starters from the results set returned by the UDM.

The following line of sql prevents first time starters from showing up in the results set:

AND STARTSLIFETIME > 0

Putting it all together...

SELECT * FROM STARTERHISTORY WHERE TRACK = 'AQU'
AND DIST = 1320
AND DISTLASTOUT = 1320
AND INTSURFACE = 2
AND SFSHIFT = 3
AND POSFINCALL BETWEEN 1 AND 3
AND STARTSLIFETIME > 0

The above sql expression can be used as a template for creating the type of UDM you are asking about. Adjust the constraints for track, surface, and distance to fit your needs.

-jp

.

~Edited by: jeff on: 2/4/2016 at: 11:27:24 AM~

jeff
2/3/2016
3:46:18 PM

--quote:

"2nd UDM--Any horse that is similar in distance and surface where the horse was 1st or 2nd at the first and/or second calls last out, even if it faded in the stretch"

--end quote

This UDM is similar to the first. If I were creating it for my own use it would have the same framework, the same data fields for track, surface, and distance, and the same elimination for first time starters that I used for the UDM in the above post.

The one new item for this UDM is this: the horse was 1st or 2nd at the first and/or second calls last out...

Looking at the JCapper StarterHistory Table Schema Doc - there is no data field for first call last out.

But there is a data field for second call last out.

Note: When I did the initial design for the data fields in the StarterHistory table I was faced with building the most bang for the buck roi-wise into the table while being limited to 255 data fields. Had I included a field for first call last out I would have had to eliminate one of the other fields in the table. After doing considerable Data Window research I decided that second call last out provided more bang for the buck roi-wise than first call last out - and decided to build that into the table rather than first call last out.

That said, the data field in the table for position at second call last out is POS2CALL - and it is listed in the JCapper StarterHistory Table Schema Doc as follows:

Field Name Data Type Description
--------------------- ------------- ----------------------------------------------
pos2call number integer position at second call last start

You can do something like the following to add 1st or 2nd at the second call last out into a sql expression:

AND POS2CALL BETWEEN 1 AND 2

Alternately, you could do it like this:

AND POS2CALL >= 1
AND POS2CALL <= 2

Putting it all together...

Using the same track, surface, distance, and elmination for first time starters as the UDM definition from my above post, the following sql expression can be used as a template for the type of UDM you are asking for in this post:

SELECT * FROM STARTERHISTORY WHERE TRACK = 'AQU'
AND DIST = 1320
AND DISTLASTOUT = 1320
AND INTSURFACE = 2
AND SFSHIFT = 3
AND POS2CALL BETWEEN 1 AND 2
AND STARTSLIFETIME > 0

-jp

.

bill2010
2/3/2016
4:07:03 PM

THANKS JEFF...

jeff
2/4/2016
4:56:01 AM

--quote:

"---3rd UDM, any race that is similar in distance and surface where the horse made a good middle move between the first and second call, where it gains position or lengths to be within 2 lengths of the leader or on the lead at the second call, even if it fades in the stretch."

--end quote

This UDM is similar to the UDMs in the above two posts. If I were creating it for my own use I would use the same framework, the same data fields for track, surface, and distance, and the same elimination for first time starters that I used for the UDMs in the above two posts.

The unique item being asked for in this UDM: the horse made a move between calls last out...

If the only acceptable move between calls is 1st call to 2nd call - then you'll have to find a way to do this outside of JCapper.

For reasons that I laid out in the above post, The JCapper StarterHistory Table Schema Doc - does not have data field for first call last out.

But there are data fields in the table for second call last out, stretch call last out, and finish call last out. They are listed in the JCapper StarterHistory Table Schema Doc as follows:

Field Name Data Type Description
--------------------- ------------- ----------------------------------------------
len2call number single beaten lengths at second call last start
pos2call number integer position at second call last start
lenstrcall number single beaten lengths at stretch call last start
posstrcall number integer position at stretch call last start
lenfincall number single beaten lengths at finish call last start
posfincall number integer position at finish call last start

It's possible to work any move between calls into a sql expression using the above data fields.

But to get it just right you first have to identify all of the possible cases you want to capture - and from there create sql to handle each case.

Let's identify each possible case where a horse makes an improvement between the 2nd call last out and the stretch call last out:

Case #1.
Horse was not the leader at either call - but made an improvement in position or beaten lengths between the 2nd call last out and the stretch call last out - to be within 2 lengths of the leader.

This can be worked into a sql expression as follows:

AND NOT LEN2CALL = 100
AND NOT LENSTRCALL = 100
AND POS2CALL BETWEEN 2 AND 20
AND POSSTRCALL BETWEEN 2 AND 20
AND (POSSTRCALL < POS2CALL OR LENSTRCALL < LEN2CALL)
AND LENSTRCALL <= 2

Notes:

The .JCP File Build routine is programmed to test for the presence of beaten lengths data in each running line. Whenever beaten lengths data is missing from a running line the routine is programmed to use 100 as the value for beaten lengths to indicate that beaten length data is missing for that running line.

The first two lines of sql eliminate cases where beaten lengths data for either call is missing from the last running line. 99% of the time when this happens it's because a horse was eased, pulled up, lost the rider, or clipped heels and fell, etc.

The next two lines of sql insure that the horse was not the leader at either call.

The fifth line of sql makes two of the evaluations needed for improvement. The first evaluation is for an improvement in position. The second evaluation is for improvement in beaten lengths.

Here, placement of the parenthesis characters is crucial.

In sql, multiple expressions embedded inside of parenthesis characters are always evaluated as a single expression. (This is true even if 100's of lines of sql are embedded inside of parenthesis.)

The last line of sql makes the final evaluation needed for improvement: "to be within 2 lengths of the leader."

Case #2.
Horse was not the leader at the 2nd call last out - but made an improvement in position between the 2nd call last out and the stretch call last out - to become the leader.

This can be worked into a sql expression as follows:

AND NOT LEN2CALL = 100
AND NOT LENSTRCALL = 100
AND POS2CALL BETWEEN 2 AND 20
AND POSSTRCALL = 1

Notes:

The first two lines of sql eliminate cases where beaten lengths data for either call is missing from the last running line. (See the notes beneath Case #1 above for a detailed explanation.)

The next line of sql insures that the horse had a valid position at the second call last out for the case being evaluated.

The final line of sql makes the improvement evaluation using deductive logic:

If a horse was not the leader at the second call last out (from the 3rd line of sql) but became the leader at the stretch call last out: Logic dictates that the required improvement took place.

Case #3.
Horse was the leader at both the 2nd call last out and the stretch call last out (improvement in position is not possible) but did make an improvement in beaten lengths between these two calls.

The this can be worked into a sql expression as follows:

AND POS2CALL = 1
AND POSSTRCALL = 1
AND NOT LEN2CALL = 100
AND NOT LENSTRCALL = 100
AND LENSTRCALL > LEN2CALL

Notes: The first two lines of sql eliminate horses that did not lead at both calls. (Leaving only those horses that were the race leader at both calls.)

The next two lines of sql eliminate cases where beaten lengths data for either call is missing from the last running line. (See the notes beneath Case #1 above for a detailed explanation.)

The last line of sql makes the evaluation that designates improvement: beaten lengths at the stretch call are greater than beaten lengths at the second call - i.e. the horse is further ahead at the stretch call than it was at the second call: Therefore the horse made the necessary improvement between calls.

Putting it all together....

At this point you have some options. You could create separate UDMs to handle each of the above three possible cases. Or alternately, you could embed the sql that handles each case inside of parenthesis characters and work sql for each case into a single UDM.

For example purposes, I'll stick with the same framework I used for the sql in the first two UDMs (see my above posts) and work each of the cases discussed in this post into a single UDM.

Let's start with the framework:

SELECT * FROM STARTERHISTORY WHERE TRACK = 'AQU'
AND DIST = 1320
AND DISTLASTOUT = 1320
AND INTSURFACE = 2
AND SFSHIFT = 3
AND STARTSLIFETIME > 0

Next, let's create a sql template for each of the possible cases:

AND ((CASE1)

 OR (CASE2)

 OR (CASE3))

Notes: Placement of parenthesis characters in the above sql template is crucial.

Remember that anything embedded inside of parenthesis characters, even 100's of individual lines, gets evaluated as a single expression.

From here, we can easily replace CASE1 in the above sql template with the actual sql posted above for Case #1.

And after doing the same for CASE2 and CASE3 - the sql template for the cases now looks like this:


AND ((NOT LEN2CALL = 100
    AND NOT LENSTRCALL = 100
    AND POS2CALL BETWEEN 2 AND 20
    AND POSSTRCALL BETWEEN 2 AND 20
    AND (POSSTRCALL < POS2CALL OR LENSTRCALL < LEN2CALL)
    AND LENSTRCALL <= 2)

OR (NOT LEN2CALL = 100
    AND NOT LENSTRCALL = 100
    AND POS2CALL BETWEEN 2 AND 20
    AND POSSTRCALL = 1)

OR (POS2CALL = 1
    AND POSSTRCALL = 1
    AND NOT LEN2CALL = 100
    AND NOT LENSTRCALL = 100
    AND LENSTRCALL > LEN2CALL))

From here it's a matter of appending the sql for the cases to the sql for the starting framework.

After doing that - here's what the final sql UDM Definition looks like:


SELECT * FROM STARTERHISTORY WHERE TRACK = 'AQU'
AND DIST = 1320
AND DISTLASTOUT = 1320
AND INTSURFACE = 2
AND SFSHIFT = 3
AND STARTSLIFETIME > 0

AND ((NOT LEN2CALL = 100
    AND NOT LENSTRCALL = 100
    AND POS2CALL BETWEEN 2 AND 20
    AND POSSTRCALL BETWEEN 2 AND 20
    AND (POSSTRCALL < POS2CALL OR LENSTRCALL < LEN2CALL)
    AND LENSTRCALL <= 2)

OR (NOT LEN2CALL = 100
    AND NOT LENSTRCALL = 100
    AND POS2CALL BETWEEN 2 AND 20
    AND POSSTRCALL = 1)

OR (POS2CALL = 1
    AND POSSTRCALL = 1
    AND NOT LEN2CALL = 100
    AND NOT LENSTRCALL = 100
    AND LENSTRCALL > LEN2CALL))

-jp

.

jeff
2/4/2016
10:07:33 AM

After giving it a look over on the morning after...

The 'final' expression can be cleaned up a bit by removing the lines of sql that reference beaten lengths equal to 100 from the template that handles the cases - and moving them to the starting framework. That way, they only appear once in the final expression instead of three times.

After doing that the final expression looks like this:


SELECT * FROM STARTERHISTORY WHERE TRACK = 'AQU'
AND DIST = 1320
AND DISTLASTOUT = 1320
AND INTSURFACE = 2
AND SFSHIFT = 3
AND STARTSLIFETIME > 0
AND NOT LEN2CALL = 100
AND NOT LENSTRCALL = 100

AND ((POS2CALL BETWEEN 2 AND 20
    AND POSSTRCALL BETWEEN 2 AND 20
    AND (POSSTRCALL < POS2CALL OR LENSTRCALL < LEN2CALL)
    AND LENSTRCALL <= 2)

OR (POS2CALL BETWEEN 2 AND 20
    AND POSSTRCALL = 1)

OR (POS2CALL = 1
    AND POSSTRCALL = 1
    AND LENSTRCALL > LEN2CALL))

-jp

.

jeff
2/4/2016
6:07:27 PM

--quote:

"--How would you use pace top or pace low with these UDMs? Thanks."

--end quote

I'd use PaceTop or PaceLow in these UDMs (or not use them) the same way I'd use PaceTop or PaceLow (or any factor for that matter) in my own UDMs.

I'll run the UDM through the Data Window with the data broken out by the factor I want to look at. I do that in the Data Window by selecting the factor I want to work with from the factors drop down - and then click the UDM button.

From there I'll evaluate the individual rows for the factor breakout data - and basically let the data tell me whether or not to add constraints for the factor I am looking at to the UDM.

Speaking in general terms - a high percentage of the time I am able to tell with just a glance at the data breakout rows whether or not it's worth my while to add new contraints for the factor I am looking at to the UDM definition.

When the data breakout rows contain mixed cutoffs for roi with acceptable win rate - and by that I mean sections of data among the middle rows with lower roi and win rate than at the bottom and top edges - experience has taught me that adding factor contraints for the factor I am looking at to the UDM definition is generally a bad idea. This is especially so when the low roi low win rate sections among the middle rows contain a significant number of starters. (Significant in this context meaning a significant percentage chunk of the total sample.)

But when the data breakout rows contain clear cutoffs for roi with acceptable win rate - experience has taught me that adding factor contraints for the factor I am looking at to the UDM definition is generally a good idea.

After adding new factor constraints to a UDM - experience has also taught me that it's almost always a good idea to record the date that the UDM definition was changed - and to evaluate the performance of the UDM going forward in time from that date.

That's the only real way to know whether or not adding factor constraints to a UDM (including PaceTop or PaceLow) was in fact a good idea.

Getting back to New Pace Top and New Pace Low. The data fields from the JCapper StarterHistory Table Schema Doc are PACETOP and PACELOW - and are defined in the JCapper StarterHistory Table Schema Doc as follows:

Field Name Data Type Description
--------------------- ------------- ----------------------------------------------
pacelow number integer new pace low
pacetop number integer new pace top

Using New Pace Top in a sql UDM...
Suppose the sake of argument that after running a UDM through the Data Window with the data broken out by New Pace Top - the data breakout suggested significant sub par performance (relative to the other data breakout rows) for horses that earned both a New Pace Top and a New Fig Top in their most recent start - similar to the data breakout row for PaceTop = 3 in this screenshot:
http://www.JCapper.com/Messageboard/Avatars/PaceTopIllustration.jpg

If that were the case I wouldn't hesitate to add something like the following line of sql to my UDM Definition:

AND NOT PACETOP = 3

But after doing so I'd want to evaluate performance of the revised UDM going forward in time for a bit before relying on it in a big way at the windows.

-jp

.

~Edited by: jeff on: 2/4/2016 at: 6:07:27 PM~

bill2010
2/8/2016
3:36:15 PM

THANKS JEFF....

JCapper Message Board

JCapper 101 -- UDM help - similar distance and similar surface

JCapper 101
-- UDM help - similar distance and similar surface